ABSTRACT

We introduce a class of linear quantile regression estimators for panel data. Our framework contains dynamic autoregressive models, models with general predetermined regressors, and models with multiple individual effects as special cases. We follow a correlated random-effects approach, and rely on additional layers of quantile regressions as a flexible tool to model conditional distributions. Conditions are given under which the model is nonparametrically identified in static or Markovian dynamic models. We develop a sequential method-of-moments approach for estimation, and compute the estimator using an iterative algorithm that exploits the computational simplicity of ordinary quantile regression in each iteration step. Finally, a Monte Carlo exercise and an application to measure the effect of smoking during pregnancy on children's birthweights complete the paper.

INTRODUCTION

Nonlinear panel data models are central to applied research. However, despite some recent progress, it is fair to say that we are still short of answers for panel versions of many models commonly used in empirical work. In this paper we focus on one particular nonlinear model for panel data: quantile regression.
nonlinear  model  for  panel  data:  quantile  regression. 

Since Koenker and Bassett (1978), quantile regression has become a prominent methodology for examining the effects of explanatory variables across the entire outcome distribution. Extending the quantile regression approach to panel data has proven challenging, however, mostly because of the difficulty of handling individual-specific heterogeneity. Starting with Koenker (2004), most panel data approaches to date proceed in a quantile-by-quantile fashion, and include individual dummies as additional covariates in the quantile regression. As shown by some recent work, however, this fixed-effects approach faces special challenges when applied to quantile regression. Galvao, Kato and Montes-Rojas (2012) develop the large-$N,T$ analysis of the fixed-effects quantile regression estimator, and show that it may suffer from large asymptotic biases. Rosen (2010) shows that the fixed-effects model for a single quantile is not point-identified.
We depart from the previous literature by proposing a random-effects approach for quantile models from panel data. This approach treats individual unobserved heterogeneity as time-invariant missing data. To describe the model, let $i = 1, \ldots, N$ denote individual units, and let $t = 1, \ldots, T$ denote time periods. The random-effects quantile regression (REQR) model specifies the $\tau$-specific conditional quantile of an outcome variable $Y_{it}$, given a sequence of strictly exogenous covariates $X_i = (X_{i1}', \ldots, X_{iT}')'$ and unobserved heterogeneity $\eta_i$, as follows:

$$Q_Y(Y_{it} \mid X_i, \eta_i, \tau) = X_{it}'\beta(\tau) + \eta_i\,\gamma(\tau), \qquad \text{for all } \tau \in (0, 1). \tag{1}$$

Note that $\eta_i$ does not depend on the percentile value $\tau$. Were data on $\eta_i$ available, one could use a standard quantile regression package to recover the parameters $\beta(\tau)$ and $\gamma(\tau)$.
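To make the "if $\eta_i$ were observed" remark concrete, the sketch below simulates data from a model of the form (1) with hypothetical coefficient functions $\beta(\tau) = 1 + \tau$ and $\gamma(\tau) = 0.5 + \tau$ (our choices, not the paper's), treats $\eta_i$ as observed, and recovers $\beta(0.5)$ and $\gamma(0.5)$. Rather than call a dedicated package, it solves the textbook linear-programming form of quantile regression with SciPy's `linprog`; the scalar $X_{it}$, the uniform designs, and the helper name `quantile_regression` are assumptions of the example.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(y, W, tau):
    """Solve min_theta sum_i rho_tau(y_i - W_i' theta) as a linear program.

    Each residual is split as u_plus - u_minus with u_plus, u_minus >= 0, so the
    objective becomes tau * sum(u_plus) + (1 - tau) * sum(u_minus).
    """
    n, p = W.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([W, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)  # theta free, slacks >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Hypothetical coefficient functions (illustrative, not estimates from the paper).
def beta(tau):
    return 1.0 + tau


def gamma(tau):
    return 0.5 + tau


rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1.0, 2.0, n)      # scalar X_it without a constant (assumption)
eta = rng.uniform(0.0, 3.0, n)    # eta_i treated as OBSERVED: the infeasible case
u = rng.uniform(0.0, 1.0, n)      # U_it ~ Uniform(0, 1)
# Since x + eta > 0, tau -> x*beta(tau) + eta*gamma(tau) is strictly increasing,
# so this draw has conditional tau-quantile x*beta(tau) + eta*gamma(tau).
y = x * beta(u) + eta * gamma(u)

W = np.column_stack([x, eta])
theta_hat = quantile_regression(y, W, tau=0.5)  # target (beta(0.5), gamma(0.5)) = (1.5, 1.0)
print(theta_hat)
```

In the actual REQR setting $\eta_i$ is not observed, which is precisely why the paper's estimator cannot stop at a single quantile regression like this one.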

Model (1) specifies the conditional distribution of $Y_{it}$ given $X_{it}$ and $\eta_i$. In order to complete the model, we also specify the conditional distribution of $\eta_i$ given the sequence of covariates $X_i$. For this purpose, we introduce an additional layer of quantile regression and specify the $\tau$-th conditional quantile of $\eta_i$ given covariates as follows:

$$Q_\eta(\eta_i \mid X_i, \tau) = X_i'\delta(\tau), \qquad \text{for all } \tau \in (0, 1). \tag{2}$$

This modelling allows for flexible conditioning on strictly exogenous regressors (and on initial conditions in dynamic settings), which may also be of interest in other panel data models. Together, equations (1)-(2) provide a fully specified semiparametric model for the joint distribution of outcomes given the sequence of strictly exogenous covariates. The aim is then to recover the model's parameters $\beta(\tau)$, $\gamma(\tau)$, and $\delta(\tau)$, for all $\tau \in (0, 1)$.

Our identification result for the REQR model is nonparametric. In particular, identification holds even if the conditional distribution of individual effects is left unrestricted. Recent research has emphasized the identification content of nonlinear panel data models with continuous outcomes (Bonhomme, 2012), as opposed to discrete outcome models, where parameters of interest are typically set-identified (Honoré and Tamer, 2006; Chernozhukov, Fernández-Val, Hahn and Newey, 2011). Pursuing this line of research, our analysis provides conditions for nonparametric identification of REQR in panels where the number of time periods $T$ is fixed, possibly very short (e.g., $T = 3$). One of the conditions required to apply the result of Hu and Schennach (2008) is a completeness assumption. Although completeness is a high-level assumption, recent papers have provided primitive conditions in specific models, including a special case of model (1).


Our analysis is most closely related to Wei and Carroll (2009), who proposed a consistent estimation method for cross-sectional linear quantile regression subject to covariate measurement error. In particular, we rely on the approach in Wei and Carroll to deal with the continuum of model parameters indexed by $\tau \in (0, 1)$. As keeping track of all parameters in the algorithm is not feasible, we build on their insight and use interpolating splines to combine the quantile-specific parameters in (1)-(2) into a complete likelihood function that depends on a finite number of parameters. Our proof of consistency, in a panel data asymptotics where $N$ tends to infinity and $T$ is kept fixed, also builds on theirs. As the sample size increases, the number of knots, and hence the accuracy of the spline approximation, increase as well. A key difference with Wei and Carroll is that, in our setup, the conditional distribution of individual effects is unknown, and needs to be estimated along with the other parameters of the model.
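The role of the knots can be illustrated with a small sketch. It assumes piecewise-linear interpolation between $L$ knots $\tau_1 < \cdots < \tau_L$ (one simple spline choice; the paper's exact spline and its treatment of the tails may differ). Because (1) is linear in the coefficients, interpolating $\beta(\tau_l)$ and $\gamma(\tau_l)$ linearly yields a piecewise-linear quantile function at any fixed $(x, \eta)$, whose implied density is piecewise constant. The coefficient functions and the helper `density_from_quantile_knots` are hypothetical.

```python
import numpy as np


def density_from_quantile_knots(y, taus, q):
    """Density implied by piecewise-linear interpolation of quantile knots.

    taus: increasing quantile levels tau_1 < ... < tau_L in (0, 1).
    q:    quantile values Q(tau_l), strictly increasing in l.
    On [q_l, q_{l+1}) the interpolated quantile function is linear, so the
    implied density is the constant (tau_{l+1} - tau_l) / (q_{l+1} - q_l).
    Outside [q_1, q_L) this sketch returns 0, i.e. it ignores the tails.
    """
    taus = np.asarray(taus, dtype=float)
    q = np.asarray(q, dtype=float)
    y = np.atleast_1d(np.asarray(y, dtype=float))
    slopes = np.diff(taus) / np.diff(q)             # density on each interval
    idx = np.searchsorted(q, y, side="right") - 1   # interval index of each y
    inside = (idx >= 0) & (idx < len(q) - 1)
    dens = np.zeros_like(y)
    dens[inside] = slopes[idx[inside]]
    return dens


# Hypothetical coefficients beta(tau) = 1 + tau**2, gamma(tau) = 0.5 + tau,
# evaluated at L = 9 knots for one fixed (x, eta).
taus = np.linspace(0.1, 0.9, 9)
x, eta = 1.5, 0.5
q = x * (1.0 + taus**2) + eta * (0.5 + taus)

# Midpoint-rule check: the density carries mass tau_L - tau_1 on [q_1, q_L].
edges = np.linspace(q[0], q[-1], 20_001)
mids = 0.5 * (edges[:-1] + edges[1:])
mass = density_from_quantile_knots(mids, taus, q).sum() * (edges[1] - edges[0])
print(mass)
```

Adding knots refines the step-function density, which mirrors the point that the accuracy of the approximation grows with the number of knots.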

2. Model and identification

In this section and the next we focus on the static version of the random-effects quantile regression (REQR) model. Section 6 will consider various extensions to dynamic models. We start by presenting the model along with several examples, and then provide conditions for nonparametric identification.

2.1. Model

Let $Y_i = (Y_{i1}, \ldots, Y_{iT})'$ denote a sequence of $T$ scalar outcomes for individual $i$, and let $X_i = (X_{i1}', \ldots, X_{iT}')'$ denote a sequence of strictly exogenous regressors, which may contain a constant. In addition, let $\eta_i$ denote a $q$-dimensional vector of individual-specific effects, and let $U_{it}$ denote a scalar error term. The model specifies the conditional quantile response function of $Y_{it}$ given $X_{it}$ and $\eta_i$ as follows:

$$Y_{it} = Q_Y(X_{it}, \eta_i, U_{it}), \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T. \tag{3}$$

We make the following assumptions.

Assumption 1 (outcomes)

(i) $U_{it}$ follows a standard uniform distribution conditional on $X_i$ and $\eta_i$.

(ii) $\tau \mapsto Q_Y(x, \eta, \tau)$ is strictly increasing on $(0, 1)$, almost surely in $(x, \eta)$.

(iii) $U_{it}$ is independent of $U_{is}$ for each $t \neq s$, conditional on $X_i$ and $\eta_i$.

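Assumption 1 (i) contains two parts. First, $U_{it}$ is assumed independent of the full sequence $X_{i1}, \ldots, X_{iT}$ and independent of individual effects. This strict exogeneity assumption rules out predetermined or endogenous covariates. Second, the marginal distribution of $U_{it}$ is normalized to be uniform on the unit interval. Part (ii) guarantees that outcomes are continuously distributed conditional on $X_{it}$ and $\eta_i$: an atom in the distribution of $Y_{it}$ would make the quantile function flat over an interval of $\tau$ values.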
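Model (3) together with Assumption 1 also reads as a recipe for simulating outcomes: draw $U_{it}$ uniformly, independently of $(X_i, \eta_i)$, and plug it into the quantile response function. The sketch below does this for one fixed $(x, \eta)$ under a hypothetical linear specification $Q_Y(x, \eta, \tau) = x\beta(\tau) + \eta\gamma(\tau)$ with $\beta(\tau) = 1 + \tau$ and $\gamma(\tau) = 0.5 + \tau$ (our choices), checks the monotonicity in part (ii), and confirms that the empirical $\tau$-quantiles of the draws match $Q_Y(x, \eta, \tau)$.

```python
import numpy as np


# Hypothetical coefficient functions (illustrative choices, not from the paper).
def beta(tau):
    return 1.0 + tau


def gamma(tau):
    return 0.5 + tau


def q_y(x, eta, tau):
    """Linear quantile response Q_Y(x, eta, tau) = x * beta(tau) + eta * gamma(tau)."""
    return x * beta(tau) + eta * gamma(tau)


x, eta = 1.5, 0.5  # one fixed covariate value and individual effect

# Assumption 1 (ii): tau -> Q_Y(x, eta, tau) strictly increasing (slope x + eta = 2 here).
grid = np.linspace(0.01, 0.99, 99)
is_increasing = bool(np.all(np.diff(q_y(x, eta, grid)) > 0))

# Assumption 1 (i): U ~ Uniform(0, 1) independently of (x, eta), then Y = Q_Y(x, eta, U).
rng = np.random.default_rng(1)
u = rng.uniform(size=200_000)
y = q_y(x, eta, u)

# Empirical quantiles of the simulated Y versus the specified quantile function.
taus = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
max_gap = float(np.max(np.abs(np.quantile(y, taus) - q_y(x, eta, taus))))
print(is_increasing, max_gap)
```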

2.2. Identification

In this section we study nonparametric identification in model (3)-(4). We start with the case where there is a single scalar individual effect (i.e., $q = \dim \eta_i = 1$), and we set $T = 3$.

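Under conditional independence over time (Assumption 1 (iii)), we have, for all $y_1, y_2, y_3$, $x = (x_1', x_2', x_3')'$, and $\eta$:

$$f_{Y_1,Y_2,Y_3 \mid \eta, X}(y_1, y_2, y_3 \mid \eta, x) = f_{Y_1 \mid \eta, X}(y_1 \mid \eta, x)\, f_{Y_2 \mid \eta, X}(y_2 \mid \eta, x)\, f_{Y_3 \mid \eta, X}(y_3 \mid \eta, x). \tag{4}$$

Hence the distribution of the data relates to the densities of interest as follows:

$$f_{Y_1,Y_2,Y_3 \mid X}(y_1, y_2, y_3 \mid x) = \int f_{Y_1 \mid \eta, X}(y_1 \mid \eta, x)\, f_{Y_2 \mid \eta, X}(y_2 \mid \eta, x)\, f_{Y_3 \mid \eta, X}(y_3 \mid \eta, x)\, f_{\eta \mid X}(\eta \mid x)\, d\eta. \tag{5}$$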


The goal is to identify $f_{Y_1 \mid \eta, X}$, $f_{Y_2 \mid \eta, X}$, $f_{Y_3 \mid \eta, X}$, and $f_{\eta \mid X}$ given knowledge of $f_{Y_1,Y_2,Y_3 \mid X}$. The setting of equation (5) is formally equivalent (conditional on $x$) to the instrumental variables setup of Hu and Schennach (2008) for nonclassical nonlinear errors-in-variables models. Specifically, in Hu and Schennach's terminology, $Y_3$ would be the outcome variable, $Y_2$ the mismeasured regressor, $Y_1$ the instrumental variable, and $\eta$ the latent, error-free regressor. We closely rely on their analysis and make the following additional assumptions.
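The eigendecomposition logic behind Hu and Schennach (2008) is easiest to see in a discrete analogue, sketched below under strong simplifying assumptions of our own: conditional on $x$, $\eta$ takes $K = 3$ values, each $Y_t$ is discretized into 3 cells, and the matrices $F_t[a, k] = P(Y_t = a \mid \eta = k)$ for $t = 1, 2$ are invertible (the finite-dimensional counterpart of completeness). All numbers are hypothetical. This is an illustration of the idea, not the paper's continuous-outcome argument. Writing $P_{12} = F_1 \operatorname{diag}(p) F_2'$ and $P_{12,y_3} = F_1 \operatorname{diag}(p)\operatorname{diag}(F_3[y_3,\cdot]) F_2'$, one gets $P_{12,y_3} P_{12}^{-1} = F_1 \operatorname{diag}(F_3[y_3,\cdot]) F_1^{-1}$.

```python
import numpy as np

# Hypothetical discrete model, conditional on x (illustrative numbers only).
# Column k of F_t is the distribution of Y_t given eta = k.
F1 = np.array([[0.7, 0.2, 0.1],
               [0.2, 0.6, 0.2],
               [0.1, 0.2, 0.7]])
F2 = np.array([[0.6, 0.3, 0.1],
               [0.3, 0.4, 0.3],
               [0.1, 0.3, 0.6]])
F3 = np.array([[0.5, 0.3, 0.1],
               [0.3, 0.4, 0.3],
               [0.2, 0.3, 0.6]])
p = np.array([0.3, 0.5, 0.2])  # distribution of eta given x

# Observable joint probabilities implied by the discrete counterpart of (5).
y3 = 0
P12 = F1 @ np.diag(p) @ F2.T                   # P(Y1 = a, Y2 = b)
P12_y3 = F1 @ np.diag(p * F3[y3]) @ F2.T       # P(Y1 = a, Y2 = b, Y3 = y3)

# P12_y3 @ inv(P12) = F1 @ diag(F3[y3]) @ inv(F1): eigenvalues are
# P(Y3 = y3 | eta = k), eigenvectors are (rescaled) columns of F1.
eigvals, eigvecs = np.linalg.eig(P12_y3 @ np.linalg.inv(P12))
eigvals, eigvecs = eigvals.real, eigvecs.real
order = np.argsort(eigvals)            # label eta's support points by eigenvalue
F3_row_hat = eigvals[order]
F1_hat = eigvecs[:, order]
F1_hat = F1_hat / F1_hat.sum(axis=0)   # each column is a distribution: sums to one

# With F1 recovered, P(Y1 = a) = sum_k F1[a, k] p_k pins down p.
p_hat = np.linalg.solve(F1_hat, P12.sum(axis=1))
print(F3_row_hat, p_hat)
```

Identification here is up to the labeling of the support points of $\eta$ (fixed above by ordering the eigenvalues), and it needs the eigenvalues to be distinct, echoing the kinds of normalization and injectivity conditions that appear in the continuous case.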

3. REQR estimation

This section considers estimation in the static model (6)-(7). We start by describing the moment restrictions that our estimator exploits, and then present the sequential estimator. In the next two sections we study the asymptotic properties of the estimator and discuss implementation issues, in turn.

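3.1. Moment restrictions

Let $\rho_\tau$ denote the check function, which is familiar from the quantile regression literature (Koenker and Bassett, 1978): $\rho_\tau(u) = (\tau - \mathbf{1}\{u < 0\})\,u$, and let $\psi_\tau(u) = \nabla \rho_\tau(u) = \tau - \mathbf{1}\{u < 0\}$ denote its derivative (for $u \neq 0$). Let also $W_{it}(\eta) = (X_{it}', \eta)'$.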
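As a quick numerical check of these definitions, the sketch below verifies that minimizing the average check loss over a constant recovers the sample $\tau$-quantile, and that $\psi_\tau$ evaluated at that quantile averages to (nearly) zero, the scalar analogue of a quantile regression first-order condition. The standard normal draws, the value $\tau = 0.3$, and the grid search are assumptions of the example.

```python
import numpy as np


def rho(u, tau):
    """Check function rho_tau(u) = (tau - 1{u < 0}) * u."""
    return (tau - (u < 0)) * u


def psi(u, tau):
    """psi_tau(u) = tau - 1{u < 0}, the derivative of rho_tau for u != 0."""
    return tau - (u < 0)


rng = np.random.default_rng(2)
draws = rng.standard_normal(5001)  # simulated data (assumption of the example)
tau = 0.3

# Minimize the average check loss over a constant c on a fine grid (step 0.001).
grid = np.linspace(-2.0, 2.0, 4001)
avg_loss = np.array([rho(draws - c, tau).mean() for c in grid])
c_hat = grid[np.argmin(avg_loss)]

# The minimizer is the sample tau-quantile, where the average of psi is ~0.
q_hat = np.quantile(draws, tau)
moment = psi(draws - q_hat, tau).mean()
print(c_hat, q_hat, moment)
```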

In order to derive the main moment restrictions, we start by noting that, for all $\tau \in (0, 1)$, the following infeasible moment restrictions hold, as a direct implication of Assumptions 1 and 2:

$$E\left[\sum_{t=1}^{T} W_{it}(\eta_i)\, \psi_\tau\!\left(Y_{it} - W_{it}(\eta_i)'\theta(\tau)\right)\right] = 0, \tag{6}$$

$$E\left[X_i\, \psi_\tau\!\left(\eta_i - X_i'\delta(\tau)\right)\right] = 0, \tag{7}$$

where $\theta(\tau) = (\beta(\tau)', \gamma(\tau))'$ stacks the coefficients in (1). Indeed, (6) is the first-order condition associated with the infeasible population quantile regression of $Y_{it}$ on $X_{it}$ and $\eta_i$. Similarly, (7) corresponds to the infeasible quantile regression of $\eta_i$ on $X_i$.

4. CONCLUSION

Random-effects quantile regression (REQR) provides a flexible approach to nonlinear panel data modelling. In our approach, quantile regression is used as a versatile tool to model the dependence between individual effects and exogenous regressors or initial conditions, and to model feedback processes in models with predetermined covariates. The empirical application illustrates the benefits of having a flexible approach to allow for heterogeneity and nonlinearity within the same model in a panel data context.

The analysis of the asymptotic properties of the REQR estimator requires an approximation argument. However, while our consistency proof allows the quality of the approximation to increase with the sample size, at this stage our characterization of the asymptotic distribution keeps the number of knots $L$ fixed as the number of observations $N$ increases. Assessing the asymptotic behavior of the quantile estimates as both $L$ and $N$ tend to infinity is an important task for future work.

Lastly, note that our quantile-based modelling of the distribution of individual effects could be of interest in other models as well. For example, one could consider semiparametric likelihood panel data models, where the conditional likelihood of the outcome $Y_i$ given $X_i$ and $\eta_i$ depends on a finite-dimensional parameter vector $\alpha$, and the conditional distribution of $\eta_i$ given $X_i$ is left unrestricted. The approach of this paper is easily adapted to this case, and delivers a semiparametric likelihood of the form:

$$f(y_i \mid x_i; \alpha, \delta(\cdot)) = \int f(y_i \mid x_i, \eta_i; \alpha)\, f(\eta_i \mid x_i; \delta(\cdot))\, d\eta_i,$$

where $\delta(\cdot)$ is a process of quantile coefficients.
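One way to see why quantile-based modelling of $\eta_i$ fits such a likelihood: if $\eta_i = x_i'\delta(V_i)$ with $V_i$ uniform and $u \mapsto x'\delta(u)$ increasing, the change of variables $\eta = x'\delta(u)$ turns the integral into $\int_0^1 f(y \mid x, x'\delta(u); \alpha)\, du$, which can be approximated on a grid of $u$ values. In the sketch below, the normal outcome density and $\delta(u) = (\Phi^{-1}(u), 1)'$ on regressors $(1, x)$ are hypothetical choices made so that the answer is known in closed form ($Y \mid x \sim N((\alpha + 1)x, 2)$); the helper names are ours.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def outcome_density(y, x, eta, alpha):
    """Hypothetical parametric part: Y | x, eta ~ N(alpha * x + eta, 1)."""
    return NormalDist(mu=alpha * x + eta, sigma=1.0).pdf(y)


def eta_quantile(x, u):
    """Hypothetical quantile model x' delta(u) on regressors (1, x) with
    delta(u) = (Phi^{-1}(u), 1)', so that eta | x ~ N(x, 1)."""
    return STD_NORMAL.inv_cdf(u) + x


def integrated_likelihood(y, x, alpha, m=20_000):
    """f(y | x) = integral over (0, 1) of f(y | x, Q_eta(x, u); alpha) du (midpoint rule)."""
    total = 0.0
    for j in range(m):
        u = (j + 0.5) / m  # midpoints stay strictly inside (0, 1)
        total += outcome_density(y, x, eta_quantile(x, u), alpha)
    return total / m


y_obs, x_obs, alpha = 1.0, 0.5, 0.8
approx = integrated_likelihood(y_obs, x_obs, alpha)
# Closed form under these assumptions: Y | x ~ N((alpha + 1) * x, 2).
exact = NormalDist(mu=(alpha + 1.0) * x_obs, sigma=sqrt(2.0)).pdf(y_obs)
print(approx, exact)
```

The design choice is that only the quantile process $\delta(\cdot)$ needs to be evaluated, never the density $f(\eta \mid x; \delta(\cdot))$ itself.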

As another example, our framework naturally extends to models with time-varying unobservables, such as:

$$Y_{it} = Q_Y(X_{it}, \eta_{it}, U_{it}), \qquad \eta_{it} = Q_\eta(\eta_{i,t-1}, V_{it}),$$

where $U_{it}$ and $V_{it}$ are i.i.d. and uniformly distributed. It seems worth assessing the usefulness of our approach in these other contexts.
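A minimal simulation sketch of such a model follows, under hypothetical specifications of our own: $Q_\eta(\eta_{t-1}, v) = 0.8\,\eta_{t-1} + \log(v/(1-v))$ and $Q_Y(x, \eta, u) = x + \eta + \log(u/(1-u))$, both strictly increasing in their uniform argument, with initial condition $\eta_{i0} = 0$ and uniform covariates. Since the logistic innovation has mean and median zero, a no-intercept regression of $\eta_{it}$ on $\eta_{i,t-1}$ should return a slope close to 0.8.

```python
import numpy as np

RHO = 0.8  # hypothetical persistence of the latent process


def logistic_quantile(v):
    """Quantile function of the standard logistic distribution, log(v / (1 - v))."""
    return np.log(v / (1.0 - v))


def simulate(n, t_periods, rng):
    """Draw (Y, X, eta) from the hypothetical Markov quantile model above."""
    eps = 1e-12  # keep uniform draws strictly inside (0, 1)
    v = np.clip(rng.uniform(size=(n, t_periods)), eps, 1.0 - eps)
    u = np.clip(rng.uniform(size=(n, t_periods)), eps, 1.0 - eps)
    x = rng.uniform(0.0, 1.0, size=(n, t_periods))
    eta = np.zeros((n, t_periods))
    prev = np.zeros(n)  # eta_{i,0} = 0 (assumed initial condition)
    for t in range(t_periods):
        eta[:, t] = RHO * prev + logistic_quantile(v[:, t])  # eta_it = Q_eta(eta_{i,t-1}, V_it)
        prev = eta[:, t]
    y = x + eta + logistic_quantile(u)  # Y_it = Q_Y(X_it, eta_it, U_it)
    return y, x, eta


rng = np.random.default_rng(3)
y, x, eta = simulate(5000, 20, rng)

# No-intercept least-squares slope of eta_it on eta_{i,t-1}.
lag, cur = eta[:, :-1].ravel(), eta[:, 1:].ravel()
slope = (lag @ cur) / (lag @ lag)
print(slope)
```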